Using Filegroups
All databases have a primary filegroup that contains
the primary data file. There can be only one primary filegroup. If you
don’t create any other filegroups, all data files belong to the primary
filegroup. The primary filegroup is also the default filegroup unless you
change the default, so tables and indexes are created there unless you
specifically place them in another filegroup.
In addition to the primary filegroup, you can add one
or more filegroups to the database, and a filegroup can contain one or
more files. The main purpose of using filegroups is to provide more
control over the placement of files and data on your server. When you
create a table or index, you can map it to a specific filegroup, thus
controlling the placement of data. A typical SQL Server database
installation generally uses a single RAID array to spread I/O across
disks and creates all files in the primary filegroup; more advanced
installations or installations with very large databases spread across
multiple array sets can benefit from the finer level of control of file
and data placement afforded by additional filegroups.
For example, for a simple database such as AdventureWorks,
you can create just one primary file that contains all data and objects
and a log file that contains the transaction log information. For a
larger and more complex database, such as a securities trading system
where large data volumes and strict performance criteria are the norm,
you might create the database with one primary file and four additional
secondary files. You can then set up filegroups so you can place the
data and objects within the database across all five files. If you have a
table that itself needs to be spread across multiple disk arrays for
performance reasons, you can place multiple files in a filegroup, each
of which resides on a different disk, and create the table on that
filegroup. For example, you can create three files (Data1.ndf, Data2.ndf, and Data3.ndf) on three disk arrays, respectively, and then assign them to the filegroup called spread_group. Your table can then be created specifically on the filegroup spread_group. Queries for data from the table are spread across the three disk arrays, thereby improving I/O performance.
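The spread_group layout just described could be set up roughly as follows. This is a minimal sketch, not a tuned configuration: the database name TradeDB, the table trade_history, the drive letters, and the file sizes are all hypothetical, and the sketch assumes that TradeDB already exists and that the three target folders are present.

```sql
-- Hypothetical database TradeDB; assumed to exist already
ALTER DATABASE TradeDB
ADD FILEGROUP spread_group
GO
-- One file per disk array (drive letters and sizes are illustrative)
ALTER DATABASE TradeDB
ADD FILE
( NAME='Data1', FILENAME='E:\SQLData\Data1.ndf', SIZE=100MB, FILEGROWTH=20MB),
( NAME='Data2', FILENAME='F:\SQLData\Data2.ndf', SIZE=100MB, FILEGROWTH=20MB),
( NAME='Data3', FILENAME='G:\SQLData\Data3.ndf', SIZE=100MB, FILEGROWTH=20MB)
TO FILEGROUP spread_group
GO
USE TradeDB
GO
-- Hypothetical table; the ON clause places it on all three files
CREATE TABLE trade_history
(trade_id INT NOT NULL, trade_date DATETIME NOT NULL, amount MONEY NOT NULL)
ON spread_group
GO
```

Because the table is created on the filegroup rather than on a specific file, SQL Server decides how its extents are distributed across Data1, Data2, and Data3.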
If a filegroup contains more than one file, when
space is allocated to objects stored in that filegroup, the data is
stored proportionally across the files. In other words, if you have one
file in a filegroup with twice as much free space as another, the first
file has two extents allocated from it for each extent allocated from
the second file.
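One way to watch this proportional allocation is to compare the free space in each file of a filegroup before and after loading data. The query below is a sketch that assumes it is run in the database of interest; it relies on the FILEPROPERTY function with the SpaceUsed property, which reports the number of 8KB pages in use, and the column aliases are arbitrary.

```sql
-- Size, used space, and free space per data file, grouped by filegroup.
-- size and SpaceUsed are both counted in 8KB pages, so /128.0 yields MB.
SELECT
    sfg.name AS filegroupname,
    sf.name AS filename,
    sf.size / 128.0 AS size_in_MB,
    FILEPROPERTY(sf.name, 'SpaceUsed') / 128.0 AS used_in_MB,
    (sf.size - FILEPROPERTY(sf.name, 'SpaceUsed')) / 128.0 AS free_in_MB
FROM sys.database_files sf
INNER JOIN sys.filegroups sfg
    ON sf.data_space_id = sfg.data_space_id
ORDER BY sfg.name, sf.name
GO
```

If one file shows twice the free space of another file in the same filegroup, new extents should be drawn from it at roughly twice the rate, as described above.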
Listing 2 provides an example of using filegroups in a database to control the file placement of the customer_info table.
Listing 2. Using a Filegroup to Control Placement for a Table
-- Create the database with one primary data file and one log file
CREATE DATABASE Customer
ON ( NAME='Customer_Data',
FILENAME='C:\SQLData\Customer_Data1.mdf',
SIZE=50MB,
MAXSIZE=100MB,
FILEGROWTH=10MB)
LOG ON ( NAME='Customer_Log',
FILENAME='C:\SQLData\Customer_Log.ldf',
SIZE=50MB,
FILEGROWTH=20%)
GO
-- Add a user-defined filegroup
ALTER DATABASE Customer
ADD FILEGROUP Cust_table
GO
-- Add a secondary data file on the G: drive to the new filegroup
ALTER DATABASE Customer
ADD FILE
( NAME='Customer_Data2',
FILENAME='G:\SQLData\Customer_Data2.ndf',
SIZE=100MB,
FILEGROWTH=20MB)
TO FILEGROUP Cust_Table
GO
USE Customer
-- The ON clause places the table in the Cust_Table filegroup
CREATE TABLE customer_info
(cust_no INT, cust_address NCHAR(200), info NVARCHAR(3000))
ON Cust_Table
GO
The CREATE DATABASE statement in Listing 2 creates a database with a primary data file and a log file. The first ALTER DATABASE statement adds a filegroup. The second ALTER DATABASE statement adds a secondary data file and assigns it to the Cust_Table filegroup. The CREATE TABLE statement creates a table; the ON Cust_Table clause places the table in the Cust_Table filegroup (the Customer_Data2 file on the G: disk partition).
The sys.filegroups system catalog view contains information about the database filegroups defined within a database, as shown in Table 2.
Table 2. The sys.filegroups System Catalog View
Column Name | Description |
---|---|
name | Name of the data space, unique within the database. |
data_space_id | Data space ID number, unique within the database. |
type | FG = Filegroup. |
type_desc | Description of data space type: ROWS_FILEGROUP. |
is_default | 1 = This is the default data space. The default data space is used when a filegroup or partition scheme is not specified in a CREATE TABLE or CREATE INDEX statement. 0 = This is not the default data space. |
filegroup_guid | GUID for the filegroup. NULL = PRIMARY filegroup. |
log_filegroup_id | Not used; value is NULL. |
is_read_only | 1 = Filegroup is read-only. 0 = Filegroup is read/write. |
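As an illustration of how these columns are used, the following sketch lists the filegroups in the Customer database from Listing 2 and then makes Cust_Table the default filegroup. The second step is optional and is shown only to demonstrate how is_default changes; it assumes the filegroup already contains at least one file, which is the case after Listing 2 has been run.

```sql
USE Customer
GO
-- Inspect the filegroups and their default/read-only flags
SELECT name, data_space_id, type, type_desc, is_default, is_read_only
FROM sys.filegroups
GO
-- Optional: make Cust_Table the default filegroup, so that tables and
-- indexes created without an ON clause are placed there
ALTER DATABASE Customer
MODIFY FILEGROUP Cust_Table DEFAULT
GO
```

Rerunning the SELECT after the ALTER DATABASE statement should show is_default set to 1 for Cust_Table and 0 for PRIMARY.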
The following statement returns the filename, size in
megabytes (not including autogrow), and the name of the filegroup to
which each file belongs:
-- size is stored as a count of 8KB pages; dividing by 128 converts it to MB
SELECT
convert(varchar(30), sf.name) as filename,
sf.size/128 as size_in_MB,
convert(varchar(30), sfg.name) as filegroupname
FROM sys.database_files sf
INNER JOIN sys.filegroups sfg
ON sf.data_space_id = sfg.data_space_id
go
filename size_in_MB filegroupname
------------------------------ ----------- -------------------------
Customer_Data 50 PRIMARY
Customer_Data2 100 Cust_table
FILESTREAM Filegroups
FILESTREAM storage is a new feature in SQL Server 2008 for storing
unstructured data, such as documents, images, and videos. FILESTREAM
storage integrates the SQL Server Database Engine with the NTFS file
system: the unstructured data is stored as files on the file system, and
the database stores a pointer to the data. Although the actual data
resides outside the database in the NTFS file system, you can still use
Transact-SQL (T-SQL) statements to insert, update, query, and back up
FILESTREAM data, while maintaining transactional consistency between the
unstructured data and the corresponding structured data with the same
level of security.
Note
To use FILESTREAM storage, you must first enable
FILESTREAM storage at the Windows level as well as at the SQL Server
instance level. You can enable FILESTREAM at the Windows level during
installation of SQL Server 2008 or at any time using SQL Server
Configuration Manager. After you enable FILESTREAM at the Windows level,
you next need to enable FILESTREAM for the SQL Server instance. You can
do this either through SQL Server Management Studio (SSMS) or via
T-SQL.
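For the T-SQL route, the instance-level setting is the filestream access level configuration option. The sketch below assumes FILESTREAM has already been enabled at the Windows level; the value 2 is one possible choice, not a recommendation.

```sql
-- Access levels: 0 = disabled, 1 = T-SQL access only,
-- 2 = T-SQL and Win32 streaming access
EXEC sp_configure 'filestream access level', 2
RECONFIGURE
GO
-- Check the access level that is actually in effect for the instance
SELECT SERVERPROPERTY('FilestreamEffectiveLevel') AS effective_level
GO
```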
After you enable FILESTREAM for the SQL Server
instance, you can enable it for a database by creating a FILESTREAM
filegroup. You can do this when you create the database, or later for an
existing database, by adding a filegroup that includes the CONTAINS FILESTREAM
clause. Unlike regular filegroups, a FILESTREAM filegroup can contain
only a single file reference, which is actually a file system folder
rather than an actual file. The folder itself must not already exist (although
the path up to the folder must exist); SQL Server creates the FILESTREAM folder. For example, the code in Listing 3 adds a FILESTREAM filegroup called Cust_FSGroup and adds the folder G:\SQLData\custinfo_FS to the filegroup. SQL Server creates the custinfo_FS folder in the G:\SQLData folder.
Listing 3. Adding a FILESTREAM Filegroup to a Database
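-- Add a filegroup that is marked as a FILESTREAM container
ALTER DATABASE Customer
ADD FILEGROUP Cust_FSGroup CONTAINS FILESTREAM
GO
-- FILENAME is a folder that SQL Server creates; no SIZE or FILEGROWTH is given
ALTER DATABASE Customer
ADD FILE
( NAME=custinfo_FS,
FILENAME = 'G:\SQLData\custinfo_FS')
TO FILEGROUP Cust_FSGroup
GO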
If you look in the G:\SQLData\custinfo_FS folder, you should see a Filestream.hdr file and an $FSLOG folder. The Filestream.hdr file is a FILESTREAM container header file that should not be moved or modified.
As you can see in the example in Listing 3,
for FILESTREAM files or filegroups, unlike regular files, you do not
specify size or growth information. No space is preallocated. The file
and filegroup grow as data is added to tables that have been created
with FILESTREAM columns.
As you create tables with FILESTREAM columns, a
subfolder is created in the filegroup folder for each table. The
subfolder names are GUIDs. Each FILESTREAM column created in the table results
in another subfolder created under the table subfolder. The column
subfolder name is also a GUID. At this point, there still are no actual
files created. That happens after you start adding rows to the table. A
file is created in the column subfolder for each row inserted into the
table with a non-NULL value for the FILESTREAM column.
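The sequence described above can be tried with a small table. This is a minimal sketch: the table name customer_docs and its columns are hypothetical, and it assumes the Cust_FSGroup filegroup from Listing 3 exists. A table with a FILESTREAM column must also have a UNIQUEIDENTIFIER ROWGUIDCOL column with a unique constraint, and the FILESTREAM column itself is declared as VARBINARY(MAX).

```sql
USE Customer
GO
-- Hypothetical table; creating it adds the table and column subfolders
CREATE TABLE customer_docs
(
    doc_id  UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWID(),
    cust_no INT NOT NULL,
    doc     VARBINARY(MAX) FILESTREAM NULL
)
FILESTREAM_ON Cust_FSGroup
GO
-- A non-NULL value in the FILESTREAM column produces a file in the column subfolder
INSERT INTO customer_docs (cust_no, doc)
VALUES (1, CAST('Sample document text' AS VARBINARY(MAX)))
GO
-- A NULL value does not produce a file
INSERT INTO customer_docs (cust_no, doc)
VALUES (2, NULL)
GO
```

After the first INSERT, the column subfolder under G:\SQLData\custinfo_FS should contain one file; the second INSERT should not add another.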